Clustering Gene Expression Data Using an Effective Dissimilarity Measure
Abstract
This paper presents two clustering methods: the first uses a density-based approach (DGC) and the second uses a frequent itemset mining approach (FINN). DGC uses regulation information as well as order-preserving ranking to identify relevant clusters in gene expression data. FINN exploits frequent itemsets and uses a nearest-neighbour approach to cluster gene sets. Both methods use a novel dissimilarity measure discussed in the paper. The clustering methods were evaluated on real-life datasets and were found to perform satisfactorily. They were also compared with some well-known clustering algorithms and found to perform well in terms of the homogeneity, silhouette and z-score cluster validity measures.
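As a rough illustration of the silhouette validity measure mentioned above, the sketch below scores a clustering of expression profiles from pairwise dissimilarities. The abstract does not define the paper's dissimilarity measure, so `pearson_dissimilarity` (one minus the Pearson correlation) is only a stand-in, and both function names are hypothetical rather than taken from the paper.

```python
from math import sqrt

def pearson_dissimilarity(x, y):
    """Stand-in measure (an assumption, not the paper's): 1 - Pearson correlation.
    Assumes both profiles have the same length and are not constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def mean_silhouette(profiles, labels, dissim):
    """Mean silhouette width over all genes; requires at least two clusters."""
    n = len(profiles)
    # Pairwise dissimilarity matrix between gene profiles.
    d = [[dissim(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the gene's own cluster.
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of any other cluster.
        b = min(sum(d[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that genes sit closer to another cluster than to their own. Any other dissimilarity function with the same two-argument form can be passed in place of the stand-in.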
Similar resources
Clustering gene expression data using random forest dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, calculated by a distance function. As increa...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method that starts with one cluster. Iteratively ...
Clustering Gene Expression Data Using an Effective Dissimilarity Measure
This paper presents two clustering methods: the first one uses a density-based approach (DGC) and the second one uses a frequent itemset mining approach (FINN). DGC uses regulation information as well as order preserving ranking for identifying relevant clusters in gene expression data. FINN exploits the frequent itemsets and uses a nearest neighbour approach for clustering gene sets. Both the ...
A new approach for clustering gene expression time series data
Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data. Choosing a similarity measure to determine the similarity or distance between profiles is an important task. This paper proposes a suitable dissimilarity measure for gene expression time series data sets. It also presents a graph-based clustering method for findi...
Incorporating heterogeneous biological data sources in clustering gene expression data
In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures for protein-protein interaction data and ChIP-chip data, the combined dissimilarity measure is defined. The combined distance measure ...